Power Analysis of a General Convolution Algorithm Mapped on a Linear Processor Array

Authors

  • Rashindra Manniesing
  • Richard P. Kleihorst
  • André van der Avoird
  • Emile A. Hendriks
Abstract

We explore the energy dissipation of the Linear Processor Array (LPA) as a function of the number of available resources (processor units, P) within the array. The number P is an important parameter: it reflects performance, relates parallel processing to energy dissipation, and influences the scaling of the other parts of the LPA architecture (memory, address generator, communication network). To make it possible to compare design variants for a fixed data width, we propose a high-level energy dissipation model of the processor, based on a detailed analysis of a general convolution algorithm. We show that the energy dissipation of the LPA can be roughly described by the relationship Etotal ∼ N/P, with N representing the data width in pixels. This relationship follows from two observations: first, the largest contribution to Etotal is the energy dissipated by the memories; second, in our model of the LPA, the data width of the memories corresponds to the number of pixels N to be processed, so the memory access rate increases when P decreases. Furthermore, we show that the energy dissipated by communication within the LPA increases with the number of resources, which is the familiar trade-off between communication and computation in parallel computing. This communication energy turns out to be negligible in the total energy dissipation, and we therefore conclude that the optimum solution is found when the full number of resources is applied within the LPA.
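
The relationship Etotal ∼ N/P can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the energy model proposed in the paper: the function names, the two cost coefficients, the linear form of the communication term, and the reading of "full number of resources" as P = N are all hypothetical choices made for illustration.

```python
import math


def lpa_energy(n_pixels, n_procs, e_mem_access=1.0, e_comm_per_proc=0.001):
    """Relative energy for one line of n_pixels on an array of n_procs units.

    Hypothetical toy model: both coefficients are made-up relative costs.
    """
    if not 1 <= n_procs <= n_pixels:
        raise ValueError("n_procs must be between 1 and n_pixels")
    # Assumption: each pass handles n_procs pixels, and each pass costs one
    # full-width memory access, so memory energy scales like N/P.
    passes = math.ceil(n_pixels / n_procs)
    e_memory = e_mem_access * passes
    # Assumption: communication energy grows linearly with P but stays small.
    e_comm = e_comm_per_proc * n_procs
    return e_memory + e_comm


def best_proc_count(n_pixels, candidates):
    """Return the candidate P with the lowest modelled energy."""
    return min(candidates, key=lambda p: lpa_energy(n_pixels, p))
```

With these assumed coefficients, halving P doubles the memory term while the communication term shrinks only slightly, so the lowest modelled energy among candidates up to N is reached at P = N, which mirrors the conclusion stated in the abstract.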

Similar resources

Design and Implementation of Field Programmable Gate Array Based Baseband Processor for Passive Radio Frequency Identification Tag (TECHNICAL NOTE)

In this paper, an Ultra High Frequency (UHF) baseband processor for a passive tag is presented. It proposes a Radio Frequency Identification (RFID) tag digital baseband architecture which is compatible with the EPC C C2/ISO18000-6B protocol. Several design approaches, such as clock gating, clock strobe design and clock management, are used. In order to reduce the area Decimal Matrix C...

Systolic algorithms for the CMU warp processor

CMU is building a 32-bit floating-point systolic array that can efficiently perform many essential computations in signal processing like the FFT and convolution. This is a one-dimensional systolic array that in general takes inputs from one end cell and produces outputs at the other end, with data and control all flowing in one direction. We call this particular systolic array the Warp processo...

CMU-CS-84-158: Systolic Algorithms for the CMU Warp Processor

CMU is building a 32-bit floating-point systolic array that can efficiently perform many essential computations in signal processing like the FFT and convolution. This is a one-dimensional systolic array that in general takes inputs from one end cell and produces outputs at the other end, with data and control all flowing in one direction. We call this particular systolic array the Warp process...

A Discrete Singular Convolution Method for the Seepage Analysis in Porous Media with Irregular Geometry

A novel discrete singular convolution (DSC) formulation is presented for seepage analysis in porous media with irregular geometry. The DSC is a promising new numerical approach which has recently been applied to solve several engineering problems. For a medium with regular geometry, implementing the DSC for seepage analysis is straightforward. But DSC implementation for a medium with ir...

A parallel Viterbi decoder for block cyclic and convolution codes

We present a parallel version of Viterbi’s decoding procedure, for which we are able to demonstrate that the resultant task graph has restricted complexity in that the number of communications to or from any processor cannot exceed 4 for BCH codes. The resulting algorithm works in lock step making it suitable for implementation on a systolic processor array, which we have implemented on a field...


Journal:
  • VLSI Signal Processing

Volume 37  Issue 

Pages  -

Publication date: 2004